Context Dependence and Reliability in Autoregressive Language Models

Sengupta, Poushali, Pandey, Shashi Raj, Maharjan, Sabita, Eliassen, Frank

arXiv.org Machine Learning

Large language models (LLMs) generate outputs by utilizing extensive context, which often includes redundant information from prompts, retrieved passages, and interaction history. In critical applications, it is vital to identify which context elements actually influence the output, as standard explanation methods struggle with redundancy and overlapping context. Minor changes in input can lead to unpredictable shifts in attribution scores, undermining interpretability and raising concerns about risks like prompt injection. This work addresses the challenge of distinguishing essential context elements from correlated ones. We introduce RISE (Redundancy-Insensitive Scoring of Explanation), a method that quantifies the unique influence of each input relative to others, minimizing the impact of redundancies and providing clearer, stable attributions. Experiments demonstrate that RISE offers more robust explanations than traditional methods, emphasizing the importance of conditional information for trustworthy LLM explanations and monitoring.
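The conditional, redundancy-insensitive idea in the abstract above can be illustrated with a toy: score each context element by the change in model output when that element alone is removed while the rest is kept, so information duplicated elsewhere earns a low unique-influence score. This is a minimal sketch under assumed names (`toy_model`, `unique_influence`), not the published RISE algorithm.

```python
def toy_model(context: frozenset) -> float:
    """Toy 'LLM confidence': supported if evidence A or B is present,
    with a small bonus for C. A and B are mutually redundant."""
    score = 0.0
    if "A" in context or "B" in context:   # redundant evidence pair
        score += 1.0
    if "C" in context:                     # unique evidence
        score += 0.5
    return score

def unique_influence(model, elements):
    """Conditional (leave-one-out) influence: f(all) - f(all minus e)."""
    full = frozenset(elements)
    return {e: model(full) - model(full - {e}) for e in elements}

scores = unique_influence(toy_model, ["A", "B", "C"])
# Redundant A and B each score 0.0 (removing one alone changes nothing),
# while C keeps its full unique contribution of 0.5.
```

The point of the toy is the failure mode the abstract describes: a marginal score would credit A and B, but only the conditional score reveals that neither is individually essential.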


Towards human-like spoken dialogue generation between AI agents from written dialogue

Mitsui, Kentaro, Hono, Yukiya, Sawada, Kei

arXiv.org Artificial Intelligence

The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents. However, generating human-like spoken dialogues from these written dialogues remains challenging. Spoken dialogues have several unique characteristics: they frequently include backchannels and laughter, and the smoothness of turn-taking significantly influences the fluidity of conversation. This study proposes CHATS -- CHatty Agents Text-to-Speech -- a discrete token-based system designed to generate spoken dialogues based on written dialogues. Our system can generate speech for both the speaker side and the listener side simultaneously, using only the transcription from the speaker side, which eliminates the need for transcriptions of backchannels or laughter. Moreover, CHATS facilitates natural turn-taking; it determines the appropriate duration of silence after each utterance in the absence of overlap, and it initiates the generation of overlapping speech based on the phoneme sequence of the next utterance in case of overlap. Experimental evaluations indicate that CHATS outperforms the text-to-speech baseline, producing spoken dialogues that are more interactive and fluid while retaining clarity and intelligibility.

Large Language Models (LLMs) have profoundly influenced the field of natural language processing (NLP) and artificial intelligence (AI) (Zhao et al., 2023). LLMs, with their capacity to generate coherent and contextually relevant content, have enabled more natural text-based dialogues between humans and computers and paved the way for inter-computer communication. The recently proposed concept of Generative Agents (Park et al., 2023) underscores the potential of LLMs, where emulated agents within the model engage in autonomous dialogues, store information, and initiate actions.
This emerging paradigm of agent-to-agent communication offers vast potential across various sectors, from entertainment to facilitating human-to-human information exchange. However, considering the dominance of spoken communication in human interactions, integrating voice into machine dialogues can provide a richer expression of individuality and emotion, offering a more genuine experience.
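The turn-taking behavior described above can be caricatured in a few lines: with no overlap, the next utterance starts after a predicted silence; with overlap, its generation begins before the current utterance ends. This is a purely illustrative toy, not the CHATS implementation; the function name and the millisecond durations are assumptions.

```python
def next_start_ms(current_end_ms: int, overlap: bool,
                  silence_ms: int = 300, overlap_lead_ms: int = 200) -> int:
    """Start time of the next utterance on the dialogue timeline."""
    if overlap:
        # begin generating the next utterance before the current turn ends
        return current_end_ms - overlap_lead_ms
    # otherwise leave a natural pause after the utterance
    return current_end_ms + silence_ms

next_start_ms(5000, overlap=False)   # 5300: a 300 ms pause
next_start_ms(5000, overlap=True)    # 4800: overlap begins 200 ms early
```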


Recognition of Visually Perceived Compositional Human Actions by Multiple Spatio-Temporal Scales Recurrent Neural Networks

Lee, Haanvid, Jung, Minju, Tani, Jun

arXiv.org Artificial Intelligence

Abstract--The current paper proposes a novel neural network model for recognizing visually perceived human actions. The proposed multiple spatiotemporal scales recurrent neural network (MSTRNN) model is derived by introducing multiple timescale recurrent dynamics to the conventional convolutional neural network model. One of the essential characteristics of the MSTRNN is that its architecture imposes both spatial and temporal constraints simultaneously on the neural activity, which varies at multiple scales across different layers. As suggested by the principle of upward and downward causation, it is assumed that the network can develop meaningful structures such as a functional hierarchy by taking advantage of such constraints during the course of learning. To evaluate the characteristics of the model, the current study uses three types of human action video datasets consisting of different types of primitive actions and different levels of compositionality on them. The performance of the MSTRNN in testing with these datasets is compared with that of other representative deep learning models used in the field. The analysis of the internal representations obtained through learning with the datasets clarifies what sorts of functional hierarchy can be developed by extracting the essential compositionality underlying the data. Recently, a convolutional neural network (CNN) [1], inspired by the mammalian visual cortex, showed remarkably better object image recognition performance than conventional vision recognition schemes that employ elaborately hand-coded visual features. A CNN trained with 1 million visual images from ImageNet [2] was able to classify hundreds of object images with an error rate of 6.67% [3], and demonstrated near-human performance [4]. However, CNNs operate on static images and are less effective in handling video image patterns. To address this shortcoming, a number of action recognition models have been developed.
H. Lee is with the Department of Electrical Engineering, Korea Institute of Science and Technology, Daejeon 305-701, Republic of Korea, email: haanvidlee@gmail.com. M. Jung is with the Department of Electrical Engineering, Korea Institute of Science and Technology, Daejeon 305-701, Republic of Korea, email: minju5436@gmail.com.
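The multiple-timescale recurrent dynamics mentioned above can be sketched with leaky-integrator units, where a time constant tau controls how quickly each unit's state tracks its input: fast units (small tau) follow the signal closely, slow units (large tau) integrate over longer windows. A minimal sketch of that mechanism only, not the MSTRNN architecture; the tau values are illustrative.

```python
def leaky_step(u: float, x: float, tau: float) -> float:
    """One leaky-integrator update: u <- (1 - 1/tau) * u + (1/tau) * x."""
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * x

fast, slow = 0.0, 0.0
for x in [1.0] * 20:                    # constant input for 20 steps
    fast = leaky_step(fast, x, tau=2.0)     # fast timescale
    slow = leaky_step(slow, x, tau=50.0)    # slow timescale
# After 20 steps the fast unit has nearly converged to the input,
# while the slow unit still lags far behind.
```

Stacking layers of such units with different tau values is what imposes the layer-wise temporal constraints the abstract refers to.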


Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

Pashler, Harold, Cepeda, Nicholas, Lindsey, Robert V., Vul, Ed, Mozer, Michael C.

Neural Information Processing Systems

When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM's prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory.
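The "cascade of leaky integrators" framing above can be illustrated with a toy multiscale trace model: memory strength is a sum of traces decaying at different rates, and each study event tops up all traces. This sketch only shows the summed-trace mechanics under assumed decay rates; it does not reproduce MCM's nonmonotonic spacing predictions.

```python
import math

def retention(study_times, probe_time, rates=(1.0, 0.1, 0.01)):
    """Mean trace strength at probe_time after studying at study_times.
    Each study event deposits one exponentially decaying trace per rate."""
    strength = 0.0
    for t in study_times:
        for r in rates:
            strength += math.exp(-r * (probe_time - t))
    return strength / (len(study_times) * len(rates))

# Two study sessions, tested at t = 100 (arbitrary units):
massed = retention([0, 1], probe_time=100)    # sessions back to back
spaced = retention([0, 50], probe_time=100)   # sessions spread out
# In this toy, the spaced schedule retains more strength at the delayed test.
```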


Information Factorization in Connectionist Models of Perception

Movellan, Javier R., McClelland, James L.

Neural Information Processing Systems

We examine a psychophysical law that describes the influence of stimulus and context on perception. According to this law, choice probability ratios factorize into components independently controlled by stimulus and context. It has been argued that this pattern of results is incompatible with feedback models of perception. In this paper we examine this claim using neural network models defined via stochastic differential equations. We show that the law is related to a condition named channel separability and has little to do with the existence of feedback connections. In essence, channels are separable if they converge into the response units without direct lateral connections to other channels and if their sensors are not directly contaminated by external inputs to the other channels. Implications of the analysis for cognitive and computational neuroscience are discussed.
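The factorization law above can be checked numerically in a toy model: if the log-odds of the two responses decompose additively into a stimulus term and a context term, the choice probability ratio factors into a stimulus-controlled and a context-controlled component, so the stimulus effect (a ratio of ratios) is the same in every context. The additive-log-odds model is an illustrative assumption, not the paper's stochastic differential equation models.

```python
import math

def choice_ratio(stimulus: float, context: float) -> float:
    """P(r1)/P(r2) in a model with additive log-odds:
    exp(s + c) = exp(s) * exp(c), i.e. the ratio factorizes."""
    return math.exp(stimulus + context)

s1, s2, c1, c2 = 0.4, -0.7, 1.1, 0.2
# Stimulus effect measured in context c1 equals that measured in c2:
effect_in_c1 = choice_ratio(s1, c1) / choice_ratio(s2, c1)
effect_in_c2 = choice_ratio(s1, c2) / choice_ratio(s2, c2)
# Both equal exp(s1 - s2), independent of context.
```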


A Dynamical Systems Approach for a Learnable Autonomous Robot

Tani, Jun, Fukumura, Naohiro

Neural Information Processing Systems

This paper discusses how a robot can learn goal-directed navigation tasks using local sensory inputs. The emphasis is that such learning tasks can be formulated as an embedding problem of dynamical systems: desired trajectories in a task space should be embedded into an adequate sensory-based internal state space so that a unique mapping from the internal state space to the motor command can be established. The paper shows that a recurrent neural network suffices to self-organize such an adequate internal state space from the temporal sensory input.
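The embedding argument above can be made concrete with a toy: two positions along a corridor can give the identical local sensor reading, so no memoryless sensor-to-motor mapping can distinguish them, while a recurrent internal state that integrates the sensory history restores a unique mapping to motor commands. The accumulator state and the threshold rule are illustrative assumptions, not the paper's recurrent network.

```python
def run(sensor_seq):
    """Recurrent toy: internal state accumulates the sensory history,
    and the motor command depends on that state, not the raw reading."""
    state = 0
    commands = []
    for s in sensor_seq:
        state += s                                   # recurrent update
        commands.append("turn" if state >= 3 else "forward")
    return commands

# The readings at steps 2 and 5 are identical (both 1), yet the commands
# differ because the internal state differs at those moments.
print(run([1, 1, 1, 0, 1]))
# ['forward', 'forward', 'turn', 'turn', 'turn']
```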


Constructive Learning Using Internal Representation Conflicts

Leerink, Laurens R., Jabri, Marwan A.

Neural Information Processing Systems

The first class of network adaptation algorithms starts out with a redundant architecture and proceeds by pruning away seemingly unimportant weights (Sietsma and Dow, 1988; Le Cun et al., 1990). A second class of algorithms starts off with a sparse architecture and grows the network to the complexity required by the problem. Several algorithms have been proposed for growing feedforward networks. The upstart algorithm of Frean (1990) and the cascade-correlation algorithm of Fahlman (1990) are examples of this approach.
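The pruning idea in the first class above can be sketched with the simplest possible criterion: remove weights whose magnitude falls below a threshold, a crude stand-in for "seemingly unimportant". The threshold and weights are illustrative; the cited methods use more principled saliency measures than raw magnitude.

```python
def prune(weights, threshold=0.05):
    """Zero out weights whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.8, -0.02, 0.3, 0.01, -0.6]
print(prune(weights))
# [0.8, 0.0, 0.3, 0.0, -0.6]
```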

